ABSTRACT

In this paper, we address the problem of distributed sparse re- 
covery of signals acquired via compressed measurements in a 
sensor network. We propose a new class of distributed algo- 
rithms to solve Lasso regression problems, when the com- 
munication to a gateway node is not possible, e.g., due to 
communication cost or privacy reasons. More precisely, we 
introduce a distributed iterative soft thresholding algorithm 
(DISTA) that consists of three steps: an averaging step, a sub- 
gradient step, and a soft thresholding operation. We prove the 
convergence of DISTA in a network represented by a com- 
plete graph, and we show that it outperforms existing algo- 
rithms in terms of performance and complexity. 

Index Terms — Distributed compressed sensing, dis- 
tributed optimization, consensus algorithms, subgradient al- 
gorithms. 



1. INTRODUCTION 

Distributed compressed sensing [1] has recently emerged as 
a new research area that aims at decentralizing data acquisi- 
tion and processing in compressed sensing. The rationale is 
the following: if we consider a network of sensors that indi- 
vidually acquire compressed measurements of correlated sig- 
nals, we can expect a reduction in the number of measure- 
ments needed to obtain exact recovery. In fact, even if each 
sensor individually takes an insufficient number of measure- 
ments, reconstruction can be achieved by directing the whole 
network information to a single collection point, which in- 
cludes a decoder that can reconstruct the signals in a joint 
fashion. However, in large-scale networks, gathering all data 
at a single point can be prohibitive from the energy consump- 
tion point of view, and can also introduce delays, severely 
reducing the sensor network performance. In other appli- 
cations, agents providing private data may not be willing to 
share them [2].

In this work, we consider the problem of in-network processing
of information that is not centrally available at a gateway
node, as formulated in [3]. Specifically, we assume that
we are provided with a sensor network, in which sensors can 
store a limited amount of information, perform a low number 
of operations, and communicate under some constraints. Our 
aim is to study how to perform compressed acquisition and re- 
covery leveraging on these seemingly scarce resources, with 
no computational support from an external gateway node. As 
is the case of decentralized methods, the key point is to suit- 
ably exploit local communication among sensors and develop 
an iterative algorithm that spreads the information through the 
network. 

In particular, we propose a decentralized version of iter- 
ative thresholding methods Q, which basically consist of a 
subgradient step that seeks to minimize the Lasso functional, 
and a thresholding step that promotes sparsity. This is ob- 
tained by keeping the subgradient and thresholding steps, and 
adding a consensus step to share information among neighboring
nodes. The reader can refer to [5]-[9] for an overview
of consensus optimization problems.

As will be seen, the proposed distributed iterative soft
thresholding algorithm (DISTA) not only requires no
centralized decoder, but also dramatically reduces
the number of measurements needed per sensor. In this paper we
theoretically prove its convergence in fully connected networks
and numerically verify its performance, comparing it
with that of existing methods, such as simultaneous orthogonal
matching pursuit (SOMP) [1] and the alternating direction
method of multipliers (ADMM) [10].



2. PROBLEM FORMULATION 
2.1. Notation 

Throughout this paper, we use the following notation. We
denote column vectors with small letters, and matrices with
capital letters. Given a matrix $X$, $X^T$ denotes its transpose
and $(X)_v$ (or $x_v$) denotes the $v$-th column of $X$. We consider
$\mathbb{R}^n$ as a Banach space endowed with the following norms:

$$\|x\|_p = \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p}, \qquad p = 1, 2.$$

For a rectangular matrix $X \in \mathbb{R}^{m \times n}$, we consider the
Frobenius norm, which is defined as follows:

$$\|X\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij}^2}.$$

A symmetric graph is a pair $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V}$ is the
set of vertices, and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges with the
property that $(i, i) \in \mathcal{E}$ for all $i \in \mathcal{V}$ and $(i, j) \in \mathcal{E}$ implies
$(j, i) \in \mathcal{E}$.

A matrix with non-negative elements $P$ is said to be
stochastic if $\sum_{j \in \mathcal{V}} P_{ij} = 1$ for every $i \in \mathcal{V}$. Equivalently, $P$
is stochastic if $P\mathbf{1} = \mathbf{1}$. The matrix $P$ is said to be adapted
to a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ if $P_{v,w} = 0$ for all $(w, v) \notin \mathcal{E}$.

2.2. Model and assumptions

We consider a sensor network whose topology is represented
by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. We assume that each node $v \in \mathcal{V}$
acquires linear measurements of the form

$$y_v = A_v x_0 + \xi_v \qquad (1)$$

where $x_0 \in \mathbb{R}^n$ is a $k$-sparse signal (i.e., the number of its
nonzero components is not larger than $k$), $\xi_v \in \mathbb{R}^m$ is an
additive noise, and $A_v \in \mathbb{R}^{m \times n}$ (with $n \gg m$) is a random
projection operator. If the measurements taken by all sensors
were available at once in a single collection point that
performs joint decoding, a solution to this problem would be to
solve the basis pursuit denoising or Lasso problem [11, 12].
The Lasso refers to the minimization of the convex function
$J : \mathbb{R}^n \to \mathbb{R}$ defined by

$$J(x, \lambda) := \sum_{v \in \mathcal{V}} \|y_v - A_v x\|_2^2 + \frac{2\lambda}{\tau} \|x\|_1 \qquad (2)$$

where $\lambda > 0$ is a scalar regularization parameter that is
usually chosen by cross validation [13] and $\tau > 0$. Let us denote
the solution of (2) as

$$\hat{x} = \hat{x}(\lambda) = \arg\min_{x \in \mathbb{R}^n} J(x; \lambda). \qquad (3)$$

This optimization problem is shown to provide an approximation
with a bounded error, which is controlled by $\lambda$ [14].

A large amount of literature has been devoted to developing
fast algorithms for solving the optimization problem
(2) and characterizing the performance and optimality
conditions. We refer to [4] for an overview of these methods.

2.3. Iterative soft thresholding

A popular approach to solve the optimization problem in (2)
is the iterative soft thresholding algorithm (ISTA). ISTA is
based on moving at each iteration in the direction of the
steepest descent and thresholding to promote sparsity [15].

Let us collect the measurements in the vector
$y = (y_1^T, \dots, y_{|\mathcal{V}|}^T)^T$ and let $A$ be the complete sensing matrix
$A = (A_1^T, \dots, A_{|\mathcal{V}|}^T)^T$. Given $x(0)$, iterate for $t \in \mathbb{N}$

$$x(t+1) = \eta_\lambda\big(x(t) + \tau A^T (y - A x(t))\big)$$

where $\tau$ is the stepsize in the direction of the steepest
descent. The operator $\eta_\lambda$ is a thresholding function to be applied
elementwise, i.e. $\eta_\lambda(x) = \mathrm{sgn}(x)(|x| - \lambda)$ if $|x| > \lambda$ and
$\eta_\lambda(x) = 0$ otherwise.

The convergence of this algorithm was proved in [15], under
the assumption that $\|A\|_2^2 < 1/\tau$.

3. PROPOSED DISTRIBUTED ALGORITHM

As has been said, transmitting all data collected in a sensor
network to a centralized unit for joint decoding is not an
efficient approach. In this work we propose a distributed
iterative algorithm to approximate $x_0$, in which the agents only
exchange information with their nearest neighbors at each
iteration, without any central coordination.

In particular, we describe a family of simple, easy to
implement, relaxed subgradient thresholding methods, and
prove their convergence.

3.1. A consensus-based reformulation of the Lasso

We recast the optimization problem in (2) into a separable
form which facilitates distributed implementation. The goal
is to split this problem into simpler subtasks executed locally
at each node.

Let us replace the global variable $x$ in (2) with local
variables $\{x_v\}_{v \in \mathcal{V}}$, representing estimates of $x_0$ provided
by each node. While the conventional centralized Lasso
problem attempts to minimize $J(x, \lambda)$, we recast the
distributed problem as an iterated minimization of the functional
$\mathcal{F} : \mathbb{R}^{n \times |\mathcal{V}|} \to \mathbb{R}_+$ defined as follows

$$\mathcal{F}(x_1, \dots, x_{|\mathcal{V}|}) := \;\cdots \qquad (4)$$

where $P = [P_{v,w}]_{v,w \in \mathcal{V}}$ is a stochastic matrix adapted to the
graph $\mathcal{G}$, and $\gamma \in (0, 1)$ is a parameter.

By minimizing $\mathcal{F}$, each node seeks to recover the sparse
vector $x_0$ from its own linear measurements, and to enforce
agreement with the estimates calculated by other sensors in
the network. It should also be noted that if there is consensus,
in the sense $x_v = x$ for all $v \in \mathcal{V}$, then
$\mathcal{F}(x, \dots, x) = J(x, \lambda)$.

Note that $\gamma$ can be viewed as a temperature parameter;
as $\gamma$ decreases, estimates $x_v$ associated with adjacent nodes
become increasingly correlated. Let us denote by $\{\hat{x}_v^{\gamma}\}_{v \in \mathcal{V}}$
the solution of (4). If $\mathcal{G}$ is connected, then we expect that

$$\lim_{\gamma \to 0} \hat{x}_v^{\gamma} = \hat{x}, \qquad \forall v \in \mathcal{V}.$$
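
As a point of reference for the distributed scheme, the centralized ISTA iteration of Section 2.3 can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names (`soft_threshold`, `ista`), the dense list-of-lists matrix representation, the zero initial point and the fixed iteration count are assumptions made only for this example.

```python
def soft_threshold(x, lam):
    """Elementwise soft thresholding: shrink each entry toward zero by lam."""
    out = []
    for xi in x:
        if abs(xi) > lam:
            out.append((abs(xi) - lam) * (1.0 if xi > 0 else -1.0))
        else:
            out.append(0.0)
    return out


def ista(A, y, lam, tau, iters=200):
    """Run x <- eta_lam(x + tau * A^T (y - A x)) starting from x = 0.

    A is a list of rows; lam is the threshold and tau the stepsize.
    The fixed iteration count is an assumption of this sketch.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # gradient-type step x + tau * A^T r
        z = [x[j] + tau * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # thresholding step promotes sparsity
        x = soft_threshold(z, lam)
    return x
```

For instance, with `A` the 2 x 2 identity, `y = [3.0, 0.2]`, `lam = 0.5` and `tau = 0.5`, the iterates approach `[2.0, 0.0]`: the large entry is shrunk by `lam / tau` and the small one is set to zero.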



3.2. DISTA: algorithm description 

The information state of a node $v$ at time $t$, denoted as $x_v(t)$,
is an estimate of an optimal solution to the problem (4). The
update at time $t + 1$ is obtained by combining this current
estimate with the ones received from some of the other agents.

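Given a strongly connected symmetric graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$,
let $P$ be the Metropolis random walk transition matrix (see
[16]): if $i \neq j$,

$$P_{ij} = \begin{cases} \big(\max\{\deg(i) + 1, \deg(j) + 1\}\big)^{-1} & \text{if } (i, j) \in \mathcal{E} \\ 0 & \text{otherwise} \end{cases}$$

where $\deg(i)$ denotes the degree (the number of neighbors)
of unit $i$ in the graph $\mathcal{G}$; and $P_{ii} = 1 - \sum_{j \neq i} P_{ij}$. Each node
$v$ has a message stored in its memory at time $t$, denoted as
$x_v(t)$.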
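
The Metropolis weights described above can be built directly from the neighbor lists. The sketch below is only illustrative: the function name `metropolis_matrix` and the dictionary-of-sets graph representation are hypothetical, and it assumes that deg(i) counts the neighbors of i excluding i itself.

```python
def metropolis_matrix(neighbors):
    """Build the Metropolis transition matrix of a symmetric graph.

    neighbors maps each node to the set of its neighbors (self excluded,
    an assumption of this sketch). Returns a list of rows, ordered by
    sorted node label.
    """
    nodes = sorted(neighbors)
    index = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    P = [[0.0] * size for _ in range(size)]
    for v in nodes:
        i = index[v]
        for w in neighbors[v]:
            j = index[w]
            # off-diagonal weight: 1 / max(deg(i) + 1, deg(j) + 1)
            P[i][j] = 1.0 / max(len(neighbors[v]) + 1, len(neighbors[w]) + 1)
        # diagonal entry makes the row sum to one
        P[i][i] = 1.0 - sum(P[i][j] for j in range(size) if j != i)
    return P
```

On a complete graph with N nodes every entry of the resulting matrix equals 1/N, while on a three-node path the end nodes keep weight 2/3 on themselves and give 1/3 to the middle node.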



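DISTA: Given $x_v(0) = 0$ for all $v \in \mathcal{V}$, iterate for $t \in \mathbb{N}$

$$\bar{x}_v(t) = \sum_{w \in \mathcal{V}} P_{v,w}\, x_w(t),$$

$$u_v(t) = x_v(t) + \tau A_v^T \big(y_v - A_v x_v(t)\big),$$

$$x_v(t+1) = \eta_\alpha\big((1 - \gamma)\, \bar{x}_v(t) + \gamma\, u_v(t)\big),$$

with $\gamma \in (0, 1)$ and $\alpha = \lambda/(|\mathcal{V}|\gamma)$.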



It should be noted that if |V| = 1, DISTA coincides with 
ISTA. 
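
The three steps of DISTA (averaging, gradient-type step, soft thresholding) can be sketched in a few lines. This is a hedged illustration rather than the authors' code: it assumes that the averaging step is the P-weighted mean of the current estimates, it takes the threshold `alpha` as a given input, and the names `dista`, `A_list` and `y_list`, as well as the fixed iteration count, are hypothetical. All nodes are updated synchronously from the estimates at time t.

```python
def soft_threshold(x, alpha):
    """Elementwise soft thresholding with threshold alpha."""
    return [(abs(xi) - alpha) * (1.0 if xi > 0 else -1.0) if abs(xi) > alpha else 0.0
            for xi in x]


def dista(A_list, y_list, P, alpha, tau, gamma, iters=500):
    """Synchronous DISTA sketch; returns one estimate per node.

    A_list[v] is the sensing matrix of node v (list of rows), y_list[v]
    its measurements, and P a stochastic matrix adapted to the graph.
    """
    num_nodes = len(A_list)
    n = len(A_list[0][0])
    X = [[0.0] * n for _ in range(num_nodes)]  # x_v(0) = 0
    for _ in range(iters):
        X_next = []
        for v in range(num_nodes):
            A, y, x = A_list[v], y_list[v], X[v]
            # averaging step: weighted mean of the neighbors' estimates
            x_bar = [sum(P[v][w] * X[w][j] for w in range(num_nodes))
                     for j in range(n)]
            # gradient-type step on the local data
            r = [y[i] - sum(A[i][j] * x[j] for j in range(n))
                 for i in range(len(A))]
            u = [x[j] + tau * sum(A[i][j] * r[i] for i in range(len(A)))
                 for j in range(n)]
            # convex combination followed by soft thresholding
            z = [(1.0 - gamma) * x_bar[j] + gamma * u[j] for j in range(n)]
            X_next.append(soft_threshold(z, alpha))
        X = X_next
    return X
```

With a single node and `P = [[1.0]]`, the update reduces to a soft-thresholded step of size `gamma * tau` on the local data, which is the ISTA-type iteration mentioned above.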

3.3. Discussion and comparison with related work 

A few contributions towards distributed reconstruction are 
available in the literature. 

In particular, simultaneous orthogonal matching pursuit 
(SOMP) assumes that each sensor measures a different signal, 
but all signals have a common sparse support. The algorithm 
first estimates the support by averaging the information held 
by the nodes, then runs an individual recovery procedure at 
each node. We notice that the averaging step could be easily 
performed in a distributed way on networks with communication
constraints, using classical consensus methods [5]. On
the other hand, in SOMP, after the averaging step, the signal is 
recovered separately by the sensors, while in DISTA coopera- 
tion is exploited also for reconstruction. Even if the measured 
signal is assumed to be common, this cannot be practically 
used in SOMP, as the recovery procedure is not iterative. 

Another algorithm for distributed sparse linear regression
is the ADMM [10], which tackles the problem in (2) by
introducing dual variables and minimizing the augmented
Lagrangian in an iterative way with respect to the primal and dual
variables. The algorithm entails the following steps for each
$t \in \mathbb{N}$: agent $v$ receives the local estimates from its neighbors,
and uses them to evaluate the dual price vector and the new
estimate via coordinate descent and a thresholding operation. The
tricky point of this algorithm is the inversion of an $n \times n$
matrix at each node, which may be computationally demanding
for very large $n$. Compared to ADMM, the updates in DISTA
are extremely simple and involve just scaling and addition of
vectors and soft thresholding operations.

4. MAIN CONTRIBUTION 

4.1. Theoretical results 

Let us define the operator $\Gamma : \mathbb{R}^{n \times |\mathcal{V}|} \to \mathbb{R}^{n \times |\mathcal{V}|}$ where

$$(\Gamma X)_v := \eta_\alpha\big[(1 - \gamma)\, \bar{x}_v + \gamma \big(x_v + \tau A_v^T (y_v - A_v x_v)\big)\big]$$

with $\bar{x}_v = \sum_{w \in \mathcal{V}} P_{v,w}\, x_w$ and $v \in \mathcal{V}$. DISTA can be rewritten as

$$X(t+1) = \Gamma X(t)$$

with any initial condition $X(0)$.

The following theorem ensures the convergence of DISTA.

Theorem 1. If $\mathcal{G}$ is complete and $\tau < \|A_v\|_2^{-2}$ for all $v \in \mathcal{V}$,
the following hold for any initial choice $X(0)$:

1. there exists $X^\star \in \mathbb{R}^{n \times |\mathcal{V}|}$ such that $\Gamma X^\star = X^\star$;

2. DISTA produces a sequence $\{X(t)\}_{t \in \mathbb{N}}$ such that
$\lim_{t \to \infty} \|X(t) - X^\star\|_F = 0$;

3. the limit point $X^\star$ is a minimizer of $\mathcal{F}$.

Sketch of the proof. Following [15], the sequence
$\{X(t)\}_{t \in \mathbb{N}}$ is proved to converge to a fixed point of $\Gamma$ by
applying Opial's theorem [17]; then the equivalence of
fixed points of $\Gamma$ and minimizers of $\mathcal{F}(X)$ is obtained by standard
variational techniques. For brevity, the complete proof
is deferred to [18]. □

4.2. Numerical results 

To demonstrate the performance of DISTA, we conduct a
series of experiments for the complete graph architecture
and for a variety of total numbers of measurements. We
consider the complete topology where $P_{ij} = 1/N$ for every
$i, j = 1, \dots, N$, with $N = |\mathcal{V}|$. For a fixed $n$, we construct random
recovery scenarios for the sparse vector $x_0$. For each $n$, we vary the
number of measurements $m$ per node and the number of nodes
in the network. For each $(n, m, |\mathcal{V}|)$ triple, we repeat the
following procedure 50 times.

A signal is generated by choosing $k$ nonzero components
uniformly among the $n$ elements and sampling the entries
from a Gaussian distribution $N(0, 1)$. Matrices $(A_v)_{v \in \mathcal{V}}$
are sampled from the Gaussian ensemble with $m$ rows, $n$
columns, zero mean and variance $1/m$. We fix $n = 150$,
$k = 15$, $\alpha = 10^{-4}$, and $\tau = 0.02$.
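
One trial of the setup just described could be generated along the following lines. This is a sketch under stated assumptions, not the authors' experimental code: the function name `make_trial`, the seeding, and the optional Gaussian noise level `noise_std` are hypothetical, and the entry variance of the sensing matrices is assumed to be 1/m.

```python
import random


def make_trial(n, k, m, num_nodes, noise_std=0.0, seed=0):
    """Draw a k-sparse signal and per-node Gaussian measurements.

    Returns (x0, A_list, y_list). The variance 1/m of the matrix entries
    and the Gaussian noise model are assumptions of this sketch.
    """
    rng = random.Random(seed)
    # choose the support uniformly and fill it with N(0, 1) entries
    x0 = [0.0] * n
    for j in rng.sample(range(n), k):
        x0[j] = rng.gauss(0.0, 1.0)
    std = (1.0 / m) ** 0.5
    A_list, y_list = [], []
    for _ in range(num_nodes):
        # m x n Gaussian sensing matrix of this node
        A = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]
        # y_v = A_v x0 + noise
        y = [sum(a * b for a, b in zip(row, x0)) + rng.gauss(0.0, noise_std)
             for row in A]
        A_list.append(A)
        y_list.append(y)
    return x0, A_list, y_list
```

Calling `make_trial(150, 15, 6, 10)` would correspond to one noise-free run with the signal length and sparsity fixed above, 6 measurements per node and 10 nodes.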

In the noise-free case, we show the performance of DISTA
in terms of reconstruction probability as a function of the
number of measurements (see Figure 1). In particular, we
declare $x_0$ to be recovered if the normalized error
$\sum_{v \in \mathcal{V}} \|x_0 - \hat{x}_v\|_2^2 / (n|\mathcal{V}|)$ falls below a fixed threshold.
The color of each cell in the figure reflects the empirical
recovery rate over the 50 runs (scaled between 0 and 1). White
denotes perfect recovery in all experiments, and black denotes
failure in all experiments. It should be noted that the total
number of measurements $m|\mathcal{V}|$ sufficient for successful
recovery remains constant.

Fig. 1. Noise-free case: performance analysis of DISTA for complete graph, n = 150, k = 15.

Fig. 2. Noise-free case: DISTA vs SOMP, complete graph, n = 150, k = 15.

In Figure 2, the probabilities of recovery of DISTA and
SOMP (obtained with 50 runs) are compared as a function of
the number of measurements per sensor. The curves are
obtained for different numbers of sensors. SOMP is assumed to
know the sparsity value $k = 15$. We immediately notice that
the number of measurements needed for success by DISTA
is smaller. Indeed, for SOMP, a number of measurements not
smaller than $k$ is a necessary (but not sufficient) condition for
good recovery. This is evident in Figure 2, which shows that
there are no recovery occurrences below $k = 15$, while above
this threshold the probability of recovery increases with the
dimension of the network. This is a substantial drawback that
DISTA is able to overcome.

Finally, let us consider the noisy case. In Figure 3 the
mean square error

$$\mathrm{MSE} = \frac{\sum_{v \in \mathcal{V}} \|x_0 - \hat{x}_v\|_2^2}{n|\mathcal{V}|}$$

averaged over 50 runs, is plotted as a function of the
signal-to-noise ratio (the ratio of the expected signal energy to the
expected noise energy, each summed over the nodes $v \in \mathcal{V}$)
for both DISTA and SOMP. The number of sensors is $|\mathcal{V}| = 10$.
For equal SNR, the MSE decreases as the number $m$ of
measurements per node increases. It should be noted that DISTA
performs better than SOMP: $m = 6$ measurements are sufficient
for DISTA to obtain an MSE lower than the one obtained by
SOMP with $m = 18$ measurements.

Fig. 3. Noise case: DISTA vs SOMP, complete graph, n = 150, k = 15, |V| = 10.
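
The error metric used in the experiments, and the empirical recovery rate derived from it, can be computed as in the short sketch below. The helper names `normalized_error` and `recovery_rate` are hypothetical, and the tolerance `tol` is left as a parameter because it is an input of the experiment rather than something fixed by this sketch.

```python
def normalized_error(x0, estimates):
    """Sum over nodes of ||x0 - x_v||_2^2, divided by n * |V|."""
    n = len(x0)
    total = sum(sum((a - b) ** 2 for a, b in zip(x0, xv)) for xv in estimates)
    return total / (n * len(estimates))


def recovery_rate(errors, tol):
    """Fraction of runs whose normalized error is below tol."""
    return sum(1 for e in errors if e < tol) / len(errors)
```

Averaging `normalized_error` over the runs gives the MSE plotted in the noisy case, while `recovery_rate` applied to the per-run errors gives the recovery probability shown in the noise-free figures.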



In results not reported here for brevity, we have observed 
that the performance of DISTA is not strongly affected by the 
graph topology; this suggests that decentralization is not a 
drawback. 

5. CONCLUDING REMARKS 

The problem of distributively estimating sparse signals from 
compressed measurements in sensor networks with limited 
communication capability is studied. In particular, the DISTA 
algorithm has been proposed. The main contribution includes 
the proof of convergence of the algorithm to a local minimum 
of the distributed Lasso estimator. Simulation results
show that DISTA significantly outperforms SOMP.